Collection Profiling for Collection Fusion in Distributed Information Retrieval Systems
Abstract
Discovering resource descriptions and merging the results obtained from remote search engines are two key issues in distributed information retrieval. In uncooperative environments, query-based sampling and merging strategies based on score normalization are well-known approaches to these problems. However, such approaches consider only the content of the remote database and ignore its retrieval performance. In this paper, we address this problem in peer-to-peer information systems and argue that the retrieval performance of each search engine should also be considered. We propose a collection profiling strategy that discovers not only collection content but also retrieval performance. We also introduce Web-based query classification and two collection fusion approaches based on the collection profiles. Our experiments show that our merging strategies are effective at merging results in uncooperative environments.
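As a rough illustration of the idea in the abstract, the sketch below merges result lists by min-max normalizing each engine's scores and then scaling them by a per-collection weight. It is not the paper's method: the function names, the `performance_weight` mapping, and the choice of min-max normalization are all assumptions made for this example, standing in for whatever the collection profile would actually supply.

```python
from typing import Dict, List, Tuple

# (document id, raw score reported by the remote engine)
Result = Tuple[str, float]


def min_max_normalize(results: List[Result]) -> List[Result]:
    """Rescale one engine's raw scores to the range [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so no ranking information; map them all to 1.0.
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def merge_results(
    per_collection: Dict[str, List[Result]],
    performance_weight: Dict[str, float],  # hypothetical profile-derived weights
    top_k: int = 10,
) -> List[Result]:
    """Merge result lists, scaling normalized scores by a per-collection weight."""
    merged: List[Result] = []
    for name, results in per_collection.items():
        # Collections without a profile entry default to a neutral weight (assumption).
        weight = performance_weight.get(name, 1.0)
        for doc, normalized in min_max_normalize(results):
            merged.append((doc, weight * normalized))
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged[:top_k]


if __name__ == "__main__":
    lists = {
        "A": [("a1", 10.0), ("a2", 5.0), ("a3", 0.0)],
        "B": [("b1", 0.9), ("b2", 0.1)],
    }
    weights = {"A": 1.0, "B": 0.8}
    for doc, score in merge_results(lists, weights, top_k=3):
        print(doc, score)
```

With equal weights this reduces to plain score-normalization merging; lowering a collection's weight pushes its documents down the merged list, which is one simple way an estimate of retrieval performance could influence fusion.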
Similar resources
Distributed IR for Digital Libraries
This paper examines technology developed to support large-scale distributed digital libraries. We describe the method used for harvesting collection information using standard information retrieval protocols and how this information is used in collection ranking and retrieval. The system that we have developed takes a probabilistic approach to distributed information retrieval using a Logistic r...
Comparison of different Collection Fusion Models in Distributed Information Retrieval
Distributed information retrieval comes into play when a user wants to get information from different sources in parallel. One of the challenges of this topic is the Collection Fusion problem: The distinct result lists of the underlying information retrieval systems (IR) have to be fused to give a global relevance-ranked result list according to the user's information need. In this paper several...
Report on the TREC-5 Experiment: Data Fusion and Collection Fusion
This paper describes and evaluates a retrieval model that considers the problem of data fusion and collection fusion as two faces of the same coin. To establish a clear theoretical foundation for combining various sources of evidence provided either by different search schemes (data fusion) or by distributed information services (collection fusion), we have implemented a retrieval model based o...
Ensuring Retrieval Effectiveness in Distributed Digital Libraries
We find that dissemination of collection-wide information (CWI) in a distributed collection of documents is needed to achieve retrieval effectiveness comparable to that of a centralized collection. Complete dissemination is unnecessary. The ...
• collection management;
• organizing and indexing the materials for storage and retrieval;
• user interfaces and human-computer interaction; and
• interope...
Learning Collection Fusion Strategies for Information Retrieval
In this paper we describe an Information Retrieval problem called collection fusion. The collection fusion problem is to maximize the number of relevant natural language documents retrieved given: a natural language query, multiple collections of documents, and a fixed total number of documents to retrieve. We describe two algorithms that use past queries to learn collection fusion strategies. ...